Results 1 - 20 of 669
1.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subjects
Human Y Chromosomes, Genomics, DNA Sequence Analysis, Humans, Base Sequence, Human Y Chromosomes/genetics, Satellite DNA/genetics, Genetic Variation/genetics, Population Genetics, Genomics/methods, Genomics/standards, Heterochromatin/genetics, Multigene Family/genetics, Reference Standards, Genomic Segmental Duplications/genetics, DNA Sequence Analysis/standards, Tandem Repeat Sequences/genetics, Telomere/genetics
3.
JAMA ; 330(3): 205-206, 2023 07 18.
Article in English | MEDLINE | ID: mdl-37379037

ABSTRACT

This Medical News article discusses the Human Pangenome Project.


Subjects
Human Genome, Genomics, Medicine, Humans, Human Genome/genetics, Genomics/standards, Medicine/trends
4.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subjects
Human Genome, Genomics, Humans, Diploidy, Human Genome/genetics, Haplotypes/genetics, DNA Sequence Analysis, Genomics/standards, Reference Standards, Cohort Studies, Alleles, Genetic Variation
6.
BMC Genomics ; 24(1): 117, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927511

ABSTRACT

BACKGROUND: Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long-reads broadly as we could identify when they were used from GenBank metadata.
RESULTS: HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~20 kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size.
CONCLUSIONS: Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.
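
The contiguity percentages in the RESULTS can be checked against the rounded mean N50 values quoted beside them. The sketch below is illustrative arithmetic only, not the authors' analysis, and the function name `relative_contiguity_pct` is hypothetical. With the rounded means from the abstract, the ratio of HiFi to other long-read assemblies comes out at about 500% for plants and about 225% for animals, which suggests that the reported 501% and 226% are ratios of the unrounded means expressed as percentages rather than percentage increases.

```python
def relative_contiguity_pct(n50_a_mb, n50_b_mb):
    """Express the mean contig N50 of set A as a percentage of set B."""
    if n50_b_mb <= 0:
        raise ValueError("baseline N50 must be positive")
    return 100.0 * n50_a_mb / n50_b_mb


# Rounded mean contig N50 values (Mb) quoted in the abstract.
plants = relative_contiguity_pct(20.5, 4.1)
animals = relative_contiguity_pct(20.9, 9.3)
print(f"plants: {plants:.0f}% of baseline, animals: {animals:.0f}% of baseline")
```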


Subjects
Biodiversity, Genomics, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Genomics/methods, Genomics/standards, Genomics/trends, Insects/classification, Insects/genetics, Fibroins/genetics, Contig Mapping, Insect Genome/genetics, Animals, Nucleic Acid Databases, Reproducibility of Results, Meta-Analysis as Topic, Datasets as Topic, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, High-Throughput Nucleotide Sequencing/trends, Plants/genetics, Plant Genome/genetics
7.
Trends Genet ; 39(3): 175-186, 2023 03.
Article in English | MEDLINE | ID: mdl-36402623

ABSTRACT

Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies.
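
For readers unfamiliar with the N50 contiguity measure mentioned above, a minimal sketch follows. It uses the common definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches at least half of the total assembly length). The function name `n50` and the contig lengths are hypothetical illustrations, not taken from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Because N50 depends only on the length distribution, two assemblies with the same N50 can differ widely in correctness and completeness, which is one reason a single contiguity figure is a limited quality measure.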


Subjects
Genomics, DNA Sequence Analysis, DNA Sequence Analysis/methods, Genomics/standards
8.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
10.
Science ; 376(6588): eabj5089, 2022 04.
Article in English | MEDLINE | ID: mdl-35357915

ABSTRACT

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.


Subjects
CpG Islands, DNA Methylation, Genetic Epigenesis, Human Genome, Centromere/genetics, Centromere/metabolism, Disease/genetics, Genetic Loci, Genomics/standards, Humans, Reference Standards, DNA Sequence Analysis
11.
Science ; 376(6588): eabl3533, 2022 04.
Article in English | MEDLINE | ID: mdl-35357935

ABSTRACT

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.


Subjects
Genetic Variation, Human Genome, Genomics/standards, DNA Sequence Analysis/standards, Humans, Reference Standards
12.
Cancer Cell ; 40(2): 109-113, 2022 02 14.
Article in English | MEDLINE | ID: mdl-35120599

ABSTRACT

Cancers other than breast, colorectal, cervical, and lung do not have guideline-recommended screening. New multi-cancer early detection (MCED) tests, which use a single blood sample, have been developed based on circulating cell-free DNA (cfDNA) or other analytes. In this commentary, we review the current evidence on these tests, provide several major considerations for new MCED tests, and outline how their evaluation will need to differ from that established for traditional single-cancer screening tests.


Subjects
Tumor Biomarkers, Early Detection of Cancer, Genomics/methods, Neoplasms/diagnosis, Neoplasms/genetics, Clinical Decision-Making, Disease Management, Disease Susceptibility, Early Detection of Cancer/methods, Early Detection of Cancer/standards, Genomics/standards, Humans, Organ Specificity
13.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subjects
Base Sequence/genetics, Eukaryota/genetics, Genomics/standards, Animals, Biodiversity, Genomics/methods, Humans, Reference Standards, Reference Values, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards
14.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35042806

ABSTRACT

Globally, 15,521 animal species are listed as threatened by the International Union for the Conservation of Nature, and of these less than 3% have genomic resources that can inform conservation management. To combat this, global genome initiatives are developing genomic resources, yet production of a reference genome alone does not conserve a species. The reference genome allows us to develop a suite of tools to understand both genome-wide and functional diversity within and between species. Conservation practitioners can use these tools to inform their decision-making. But, at present there is an implementation gap between the release of genome information and the use of genomic data in applied conservation by conservation practitioners. In May 2020, we launched the Threatened Species Initiative and brought a consortium of genome biologists, population biologists, bioinformaticians, population geneticists, and ecologists together with conservation agencies across Australia, including government, zoos, and nongovernment organizations. Our objective is to create a foundation of genomic data to advance our understanding of key Australian threatened species, and ultimately empower conservation practitioners to access and apply genomic data to their decision-making processes through a web-based portal. Currently, we are developing genomic resources for 61 threatened species from a range of taxa, across Australia, with more than 130 collaborators from government, academia, and conservation organizations. Developed in direct consultation with government threatened-species managers and other conservation practitioners, herein we present our framework for meeting their needs and our systematic approach to integrating genomics into threatened species recovery.


Subjects
Conservation of Natural Resources/methods, Endangered Species/legislation & jurisprudence, Genomics/standards, Animals, Data Collection, Endangered Species/trends, Genome, Genomics/legislation & jurisprudence, Genomics/methods, Government
15.
Mol Genet Genomics ; 297(1): 33-46, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34755217

ABSTRACT

Based on molecular markers, genomic prediction enables us to speed up breeding schemes and increase the response to selection. There are several high-throughput genotyping platforms able to deliver thousands of molecular markers for genomic study purposes. However, even though it is widely applied in plant breeding, species without a reference genome cannot fully benefit from genomic tools and modern breeding schemes. We used a method to assemble a population-tailored mock genome to call single-nucleotide polymorphism (SNP) markers without an available reference genome, and for the first time, we compared the results with standard genotyping platforms (array and genotyping-by-sequencing (GBS) using a reference genome) for performance in genomic prediction models. Our results indicate that using a population-tailored mock genome to call SNPs delivers reliable estimates for the genomic relationship between genotypes. Furthermore, genomic prediction estimates were comparable to standard approaches, especially when considering only additive effects. However, mock genomes were slightly worse than arrays at predicting traits influenced by dominance effects, but still performed as well as standard GBS methods that use a reference genome. Nevertheless, the array-based SNP marker methods achieved the best predictive ability and reliability to estimate variance components. Overall, the mock genomes can be a worthy alternative for genomic selection studies, especially for those species where the reference genome is not available.


Subjects
Computational Biology, Genotyping Techniques, Genetic Models, Animals, Chimera/genetics, Computational Biology/methods, Computational Biology/standards, Datasets as Topic, Genome, Genome-Wide Association Study/methods, Genome-Wide Association Study/standards, Genomics/methods, Genomics/standards, Genotype, Genotyping Techniques/methods, Genotyping Techniques/standards, Phenotype, Reference Standards, Reproducibility of Results, Genetic Selection, Species Specificity, Zea mays/classification, Zea mays/genetics
16.
Nucleic Acids Res ; 50(D1): D1468-D1474, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34747486

ABSTRACT

PLAZA is a platform for comparative, evolutionary, and functional plant genomics. It makes a broad set of genomes, data types and analysis tools available to researchers through a user-friendly website, an API, and bulk downloads. In this latest release of the PLAZA platform, we are integrating a record number of 134 high-quality plant genomes, split up over two instances: PLAZA Dicots 5.0 and PLAZA Monocots 5.0. This number of genomes corresponds with a massive expansion in the number of available species when compared to PLAZA 4.0, which offered access to 71 species, an 89% overall increase. The PLAZA 5.0 release contains information for 5 882 730 genes, and offers pre-computed gene families and phylogenetic trees for 5 274 684 protein-coding genes. This latest release also comes with a set of new and updated features: a new BED import functionality for the workbench, improved interactive visualizations for functional enrichments and genome-wide mapping of gene sets, and a fully redesigned and extended API. Taken together, this new version offers extended support for plant biologists working on different families within the green plant lineage and provides an efficient and versatile toolbox for plant genomics. All PLAZA releases are accessible from the portal website: https://bioinformatics.psb.ugent.be/plaza/.


Subjects
Biological Evolution, Genetic Databases, Plants/classification, Software, Plant Genome/genetics, Genomics/standards, Molecular Sequence Annotation, Multigene Family/genetics, Phylogeny, Plants/genetics
17.
Nat Rev Genet ; 23(3): 169-181, 2022 03.
Article in English | MEDLINE | ID: mdl-34837041

ABSTRACT

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.


Subjects
Genomics/methods, Machine Learning, Animals, Genomics/standards, Genomics/trends, Humans, Machine Learning/standards, Statistical Models, Software
18.
Genes (Basel) ; 12(12), 2021 11 25.
Article in English | MEDLINE | ID: mdl-34946832

ABSTRACT

Variant interpretation is challenging as it involves combining different levels of evidence in order to evaluate the role of a specific variant in the context of a patient's disease. Many in-depth refinements followed the original 2015 American College of Medical Genetics (ACMG) guidelines to overcome subjective interpretation of criteria and classification inconsistencies. Here, we developed an ACMG-based classifier that retrieves information for variant interpretation from the VarSome Stable-API environment and allows molecular geneticists involved in clinical reporting to introduce the necessary changes to criterion strength and to add or exclude criteria assigned automatically, ultimately leading to the final variant classification. We also developed a modified ACMG checklist to assist molecular geneticists in adjusting criterion strength and in adding literature-retrieved or patient-specific information, when available. The proposed classifier is an example of integration of automation and human expertise in variant curation, while maintaining the laboratory analytical workflow and the established bioinformatics pipeline.


Subjects
Genetic Variation/genetics, Human Genome/genetics, Genomics/standards, Computational Biology/standards, Genetic Testing/standards, Humans
19.
Eur Rev Med Pharmacol Sci ; 25(1 Suppl): 1-6, 2021 12.
Article in English | MEDLINE | ID: mdl-34890028

ABSTRACT

OBJECTIVE: While the bioinformatic workflow, from quality control to annotation, is quite standardized, the interpretation of variants is still a challenge. The decreasing cost of massively parallel NGS has produced hundreds of variants per patient to analyze and interpret. The ACMG "Standards and guidelines for the interpretation of sequence variants", widely adopted in clinical settings, assume that the clinician has a comprehensive knowledge of the literature and the disease.
MATERIALS AND METHODS: To semi-automate the application of the guidelines, we decided to develop an algorithm that exploits VarSome, a widely used platform that interprets variants on the basis of information from more than 70 genome databases.
RESULTS: Here we explain how we integrated VarSome API into our existing clinical diagnostic pipeline for NGS data to obtain validated reproducible results as indicated by accuracy, sensitivity and specificity.
CONCLUSIONS: We validated the automated pipeline to be sure that it was doing what we expected. We obtained 100% sensitivity, specificity and accuracy, confirming that it was suitable for use in a diagnostic setting.
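
The three validation figures named in this abstract follow from a standard confusion matrix. The sketch below shows the usual definitions only; it is not the authors' pipeline, the function name `diagnostic_metrics` is hypothetical, and the counts are invented for illustration (the abstract does not report them).

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: truly positive calls that were detected / missed.
    tn/fp: truly negative calls that were correctly rejected / wrongly reported.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical validation set with no false calls: all three metrics are 1.0.
print(diagnostic_metrics(tp=40, fp=0, tn=60, fn=0))
```

All three metrics reach 100% only when a validation set contains no false positives and no false negatives, so the result is only as strong as the size and composition of that set.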


Subjects
Algorithms, Genetic Variation/genetics, Genomics/standards, High-Throughput Nucleotide Sequencing/standards, Practice Guidelines as Topic/standards, Search Engine/standards, Computational Biology/methods, Computational Biology/standards, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Search Engine/methods, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards
20.
Am J Hum Genet ; 108(12): 2224-2237, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34752750

ABSTRACT

Over 100 million research participants around the world have had research array-based genotyping (GT) or genome sequencing (GS), but only a small fraction of these have been offered return of actionable genomic findings (gRoR). Between 2017 and 2021, we analyzed genomic results from 36,417 participants in the Mass General Brigham Biobank and offered to confirm and return pathogenic and likely pathogenic variants (PLPVs) in 59 genes. Variant verification prior to participant recontact revealed that GT falsely identified PLPVs in 44.9% of samples, and GT failed to identify 72.0% of PLPVs detected in a subset of samples that were also sequenced. GT and GS detected verified PLPVs in 1% and 2.5% of the cohort, respectively. Of 256 participants who were alerted that they carried actionable PLPVs, 37.5% actively or passively declined further disclosure. 76.3% of those carrying PLPVs were unaware that they were carrying the variant, and over half of those met published professional criteria for genetic testing but had never been tested. This gRoR protocol cost approximately $129,000 USD per year in laboratory testing and research staff support, representing $14 per participant whose DNA was analyzed or $3,224 per participant in whom a PLPV was confirmed and disclosed. These data provide logistical details around gRoR that could help other investigators planning to return genomic results.
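
The per-participant costs above can be roughly reproduced from the other figures in the abstract. The sketch below rests on two assumptions that the abstract does not state: that the annual cost applies to four years (2017 to 2021), and that the participants "in whom a PLPV was confirmed and disclosed" are the 256 alerted participants minus the 37.5% who declined. The function name `cost_per_participant` is hypothetical.

```python
def cost_per_participant(annual_cost_usd, years, analyzed, alerted, declined_fraction):
    """Return (cost per analyzed participant, cost per disclosure, disclosures)."""
    total_cost = annual_cost_usd * years
    # ASSUMPTION: disclosures = alerted participants who did not decline.
    disclosed = alerted * (1 - declined_fraction)
    return total_cost / analyzed, total_cost / disclosed, disclosed


per_analyzed, per_disclosed, disclosed = cost_per_participant(
    annual_cost_usd=129_000,  # from the abstract (approximate)
    years=4,                  # ASSUMPTION: programme duration in years
    analyzed=36_417,          # from the abstract
    alerted=256,              # from the abstract
    declined_fraction=0.375,  # from the abstract
)
print(f"{disclosed:.0f} disclosures, ${per_analyzed:.2f} per analyzed participant, "
      f"${per_disclosed:.0f} per disclosure")
```

Under these assumptions the results land close to the reported $14 and $3,224; small differences are expected because the annual cost is itself given as an approximation.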


Subjects
Biological Specimen Banks, Disease/genetics, Genetic Variation, Human Genome, Genomics, Adult, Cohort Studies, DNA, Disclosure, Duty to Recontact, Female, Genetic Research, Genetic Testing, Genomics/economics, Genomics/standards, Genomics/trends, Humans, Informed Consent, Male, Middle Aged, Reproducibility of Results
...